(54) Sequence-determined DNA fragments and corresponding polypeptides encoded thereby



(57) The present invention provides DNA molecules that constitute fragments of the genome of a plant, and
polypeptides encoded thereby. The DNA molecules are useful for specifying a gene product in cells, either as a
promoter or as a protein coding sequence or as a UTR or as a 3' termination sequence, and are also useful in
controlling the behavior of a gene in the chromosome, in controlling the expression of a gene or as tools for
genetic mapping, recognizing or isolating identical or related DNA fragments, or identification of a particular
individual organism, or for clustering of a group of organisms with a common trait.

Arabidopsis DNA is used in the present experiment, but the procedure is a general one.
[0924] 313. Ion transport protein

[0925] This family contains sodium, potassium and calcium ion channels. The family has 6 transmembrane helices, of
which the last two flank a loop that determines ion selectivity. In some sub-families (e.g. Na channels) the domain
is repeated four times, whereas in others (e.g. K channels) the protein forms a tetramer in the membrane. A bacterial
structure of the protein is known for the last two helices, but it is not in the Pfam family because it lacks the
first four helices.
[0926] 314. Isocitrate and isopropylmalate dehydrogenases signature (isodh)

Isocitrate dehydrogenase (IDH) [1,2] is an important enzyme of carbohydrate metabolism which catalyzes the oxidative
decarboxylation of isocitrate into alpha-ketoglutarate. IDH is either dependent on NAD+ (EC 1.1.1.41) or on NADP+
(EC 1.1.1.42). In eukaryotes there are at least three isozymes of IDH: two are located in the mitochondrial matrix (one
is NAD+-dependent, the other NADP+-dependent), while the third one (also NADP+-dependent) is cytoplasmic. In
Escherichia coli the activity of an NADP+-dependent form of the enzyme is controlled by the phosphorylation of a serine
residue; the phosphorylated form of IDH is completely inactivated. 3-isopropylmalate dehydrogenase (EC 1.1.1.85)
(IMDH) [3,4] catalyzes the third step in the biosynthesis of leucine in bacteria and fungi, the oxidative decarboxylation
of 3-isopropylmalate into 2-oxo-4-methylvalerate. Tartrate dehydrogenase (EC 1.1.1.93) [5] catalyzes the oxidation of
tartrate to oxaloglycolate. These enzymes are evolutionarily related [1,3,4,5]. The best conserved region of these
enzymes is a glycine-rich stretch of residues located in the C-terminal section. This region was used as a signature pattern.
[0927] Consensus pattern: [NS]-[LIN/rYTHFYDN]-G-[D^ 
[LIVMPA]-G-[LIVMF]- 

[1] Hurley J.H., Thorsness P.E., Ramalingam V., Helmers N.H., Koshland D.E. Jr., Stroud R.M. Proc. Natl. Acad. Sci. U.S.A. 86:8635-8639(1989).
[2] Cupp J.R., McAlister-Henn L. J. Biol. Chem. 266:22199-22205(1991).
[3] Imada K., Sato M., Tanaka N., Katsube Y., Matsuura Y., Oshima T. J. Mol. Biol. 222:725-738(1991).
[4] Zhang T., Koshland D.E. Jr. Protein Sci. 4:84-92(1995).
[5] Tipton P.A., Beecher B.S. Arch. Biochem. Biophys. 313:15-21(1994).

[0928] 315. Jacalin-like lectin domain
[0929] Proteins containing this domain are lectins. It is found in 1 to 6 copies in these proteins. The domain is also
found in the animal prostatic spermine-binding protein (Swiss:P15501).
[0930] [1] Sankaranarayanan R, Sekar K, Banerjee R, Sharma V, Surolia A, Vijayan M; Nat Struct Biol 1996;3:596-603.

[0931] 316. KH domain 

[0932] KH motifs probably bind RNA directly. Autoantibodies to Nova, a KH domain protein, cause paraneoplastic
opsoclonus ataxia.

[1] Burd CG, Dreyfuss G, Science 1994;265:615-621.
[2] Musco G, Stier G, Joseph C, Castiglione Morelli MA, Nilges M, Gibson TJ, Pastore A, Cell 1996;85:237-245.
[0933] 317. Kelch motif 

[0934] The kelch motif was initially discovered in Kelch (Swiss:Q04652). In this protein there are six copies of the
motif. It has been shown that Swiss:Q04652 is related to Galactose Oxidase [1] for which a structure has been solved
[2]. The kelch motif forms a beta sheet. Several of these sheets associate to form a beta propeller structure as found
in neur,

[0935] [1] Bork P, Doolittle RF, J Mol Biol 1994;236:1277-1282. [2] Ito N, Phillips SE, Stevens C, Ogel ZB, McPherson
MJ, Keen JN, Yadav KD, Knowles PF, Nature 1991;350:87-90.

[0936] 318. Soybean trypsin inhibitor (Kunitz) protease inhibitors family signature 

[0937] The soybean trypsin inhibitor (Kunitz) family [1] is one of the numerous families of proteinase inhibitors. It
comprises plant proteins which have inhibitory activity against serine proteinases from the trypsin and subtilisin families,
thiol proteinases and aspartic proteinases, as well as some proteins that are probably involved in seed storage. This
family is currently known to group the following proteins: - Trypsin inhibitors A, B, C, KTI1, and KTI2 from soybean. -
Trypsin inhibitor DE3 from coral beans (Erythrina sp.). - Trypsin inhibitor DE5 from sandal bead tree. - Trypsin inhibitors
1A (WTI-1A), 1B (WTI-1B), and 2 (WTI-2) from goa bean. - Trypsin inhibitor from Acacia confusa. - Trypsin inhibitor
from silk tree. - Chymotrypsin inhibitor 3 (WCI-3) from goa bean. - Cathepsin D inhibitors PDI and NDI from potato [2],
which inhibit both cathepsin D (aspartic proteinase) and trypsin. - Alpha-amylase/subtilisin inhibitors from barley and
wheat. - Albumin-1 (WBA-1) from goa bean seeds [3]. - Miraculin from Richadella dulcifica [4], a sweet taste protein.
- Sporamin from sweet potato [5], the major tuberous root protein. - Thiol proteinase inhibitor PCPI 8.3 (P340) from
potato tuber [6]. - Wound responsive protein gwin3 from poplar tree [7]. - 21 Kd seed protein from cocoa [8]. All these
proteins contain from 170 to 200 amino acid residues and one or two intrachain disulfide bonds. The best conserved
region is found in their N-terminal section and is used as a signature pattern.
[0938] Consensus pattern: [LIVM]-x-D-x-[EDNTY]-[DG]-[RKHDENQ]-x-[LIVM]-x(5)-Y-x-[LIVM]

[1] Laskowski M., Kato I. Annu. Rev. Biochem. 49:593-626(1980).
[2] Ritonja A., Krizaj I., Mesko P., Kopitar M., Lucovnik P., Strukelj B., Pungercar J., Buttle D.J., Barrett A.J., Turk V. FEBS Lett. 267:13-15(1990).
[3] Kortt A.A., Strike P.M., de Jersey J. Eur. J. Biochem. 181:403-408(1989).
[4] Theerasilp S., Hitotsuya H., Nakajo S., Nakaya K., Nakamura Y., Kurihara Y. J. Biol. Chem. 264:6655-6659(1989).
[5] Hattori T., Yoshida N., Nakamura K. Plant Mol. Biol. 13:563-572(1989).
[6] Krizaj I., Drobnic-Kosorok M., Brzin J., Jerala R., Turk V. FEBS Lett. 333:15-20(1993).
[7] Bradshaw H.D., Hollick J.B., Parsons T.J., Clarke H.R.G., Gordon M.P. Plant Mol. Biol. 14:51-59(1989).
[8] Tai H., McHenry L., Fritz P.J., Furtek D.B. Plant Mol. Biol. 16:913-915(1991).
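
A consensus pattern of this kind can be turned into a regular expression and used to scan a protein sequence. The following is a minimal sketch, assuming the usual PROSITE conventions (x for any residue, square brackets for allowed residues, curly braces for excluded residues, a number in parentheses for repeats). The function name `prosite_to_regex` and the test sequence are hypothetical and are not part of the original disclosure; anchors such as `<` and `>` are not handled.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pieces = []
    for element in pattern.strip().split("-"):
        element = element.strip()
        if not element:
            continue  # tolerate a stray trailing separator
        # Separate the residue specification from an optional repeat, e.g. x(5) or x(2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.groups()
        if core == "x":
            core = "."  # x stands for any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # {AB} excludes residues A and B
        # [ABC] classes and single letters are already valid regex syntax.
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        pieces.append(core)
    return "".join(pieces)

# Kunitz family signature as given in the consensus pattern above.
KUNITZ = "[LIVM]-x-D-x-[EDNTY]-[DG]-[RKHDENQ]-x-[LIVM]-x(5)-Y-x-[LIVM]"
KUNITZ_RE = re.compile(prosite_to_regex(KUNITZ))

# Hypothetical test sequence, built by hand to satisfy the pattern.
toy = "GGG" + "LADAEDRALAAAAAYAL" + "GGG"
hit = KUNITZ_RE.search(toy)
print(hit.start() + 1, hit.group())  # 1-based start position and matched segment
```

A hit with such a pattern only indicates that the signature is present; it does not by itself establish family membership.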

[0939] 319. Beta-ketoacyl synthases active site

Beta-ketoacyl-ACP synthase (KAS) [1] is the enzyme that catalyzes the condensation of malonyl-ACP with the growing
fatty acid chain. It is found as a component of the following enzymatic systems: - Fatty acid synthetase (FAS), which
catalyzes the formation of long-chain fatty acids from acetyl-CoA, malonyl-CoA and NADPH. Bacterial and plant
chloroplast FAS are composed of eight separate subunits which correspond to different enzymatic activities; beta-ketoacyl
synthase is one of these polypeptides. Fungal FAS consists of two multifunctional proteins, FAS1 and FAS2; the
beta-ketoacyl synthase domain is located in the C-terminal section of FAS2. Vertebrate FAS consists of a single
multifunctional chain; the beta-ketoacyl synthase domain is located in the N-terminal section [2]. - The multifunctional
6-methylsalicylic acid synthase (MSAS) from Penicillium patulum [3]. This is a multifunctional enzyme involved in the
biosynthesis of a polyketide antibiotic and which has a KAS domain in its N-terminal section. - Polyketide antibiotic synthase
enzyme systems. Polyketides are secondary metabolites produced by microorganisms and plants from simple fatty
acids. KAS is one of the components involved in the biosynthesis of the Streptomyces polyketide antibiotics granaticin
[4], tetracenomycin C [5] and erythromycin. - Emericella nidulans multifunctional protein Wa. Wa is involved in the
biosynthesis of conidial green pigment. Wa is a protein of 216 Kd that contains a KAS domain. - Rhizobium nodulation
protein nodE, which probably acts as a beta-ketoacyl synthase in the synthesis of the nodulation Nod factor fatty acyl
chain. - Yeast mitochondrial protein CEM1. The condensation reaction is a two-step process: the acyl component of
an activated acyl primer is transferred to a cysteine residue of the enzyme and is then condensed with an activated
malonyl donor with the concomitant release of carbon dioxide. The sequence around the active site cysteine is well
conserved and can be used as a signature pattern.

[0940] Consensus pattern: G-x(4)-[LIVMFAP]-x(2)-[AGC]-C-[STA](2)-[STAG]-x(3)-[LIVMF] [C is the active site residue]

[1] Kauppinen S., Siggaard-Andersen M., von Wettstein-Knowles P. Carlsberg Res. Commun. 53:357-370(1988).
[2] Witkowski A., Rangan V.S., Randhawa Z.I., Amy C.M., Smith S. Eur. J. Biochem. 198:571-579(1991).
[3] Beck J., Ripka S., Siegner A., Schiltz E., Schweizer E. Eur. J. Biochem. 192:487-498(1990).
[4] Bibb M.J., Biro S., Motamedi H., Collins J.F., Hutchinson C.R. EMBO J. 8:2727-2736(1989).
[5] Sherman D.H., Malpartida F., Bibb M.J., Kieser H.M., Bibb M.J., Hopwood D.A. EMBO J. 8:2717-2725(1989).
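
Because the beta-ketoacyl synthase signature marks one residue, the active site cysteine, a scan can report the position of that residue rather than only the start of the match. The sketch below is illustrative only: the regular expression is a hand translation of the consensus pattern above, and the function name `active_site_cysteines` and the test sequence are hypothetical.

```python
import re

# Hand translation of G-x(4)-[LIVMFAP]-x(2)-[AGC]-C-[STA](2)-[STAG]-x(3)-[LIVMF].
# The lookahead lets overlapping candidate sites be reported; the named group
# marks the cysteine that the pattern annotates as the active site residue.
KAS_SITE = re.compile(
    r"(?=G.{4}[LIVMFAP].{2}[AGC](?P<cys>C)[STA]{2}[STAG].{3}[LIVMF])"
)

def active_site_cysteines(sequence: str) -> list:
    """Return 1-based positions of cysteines that sit inside a signature match."""
    return [m.start("cys") + 1 for m in KAS_SITE.finditer(sequence)]

# Hypothetical test sequence, built by hand to satisfy the pattern.
toy = "MK" + "GAAAALAAACSSSAAAL" + "KK"
print(active_site_cysteines(toy))
```

A sequence without the signature yields an empty list, so the function can be applied to every polypeptide in a collection without special cases.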

[0941] 320. Kinesin motor domain signature and profile 

Kinesin [1,2,3] is a microtubule-associated force-producing protein that may play a role in organelle transport. Kinesin
is an oligomeric complex composed of two heavy chains and two light chains. The kinesin motor activity is directed
toward the microtubule's plus end. The heavy chain is composed of three structural domains: a large globular N-terminal
domain which is responsible for the motor activity of kinesin (it is known to hydrolyze ATP, to bind and move on
microtubules); a central alpha-helical coiled coil domain that mediates the heavy chain dimerization; and a small globular
C-terminal domain which interacts with other proteins (such as the kinesin light chains), vesicles and membranous
organelles. A number of proteins have been recently found that contain a domain similar to that of the kinesin 'motor'
domain [1,4,E1]: - Drosophila claret segregational protein (ncd). Ncd is required for normal chromosomal segregation
in meiosis, in females, and in early mitotic divisions of the embryo. The ncd motor activity is directed toward the
microtubule's minus end. - Drosophila kinesin-like protein (nod). Nod is required for the distributive chromosome segre-
1/1 - (C) FILE REGISTRY

RN - 301764-80-7 REGISTRY 

OTHER NAMES: (Arabidopsis thaliana clone Ceres_1715205) (9CI) (CA INDEX NAME)

PN: EP1033405 SEQID: 34187 claimed protein
PS - PROTEIN SEQUENCE 
SQL - 421 

SEQ   1 LIVQWLREKR VKKHMASLPL GPQPHALAPP LQLHDGDALK RRPELDSDKE
     51 MSAAVIEGND AVTGHIISTT IGGKNGEPKQ TISYMAERVV GTGSFGIVFQ
    101 AKCLETGESV AIKKVLQDRR YKNRELQLMR PMDHPNVISL KHCFFSTTSR
    151 DELFLNLVME YVPETLYRVL RHYTSSNQRM PIFYVKLYTY QIFRGLAYIH
    201 TVPGVCHRDV KPQNLLVDPL THQVKLCDFG SAKVLVKGEP NISYICSRYY
    251 RAPELIFGAT EYTASIDIWS AGCVLAELLL GQPLFPGENS VDQLVEIIKV
    301 LGTPTREEIR CMNPNYTDFR FPQIKAHPWH KVFHKRMPPE AIDLASRLLQ
    351 YSPSLRCTAL EACAHPFFNE LREPNARLPN GRPLPPLFNF KQELGGASME
    401 LINRLIPEHV RRQMSTGLQN S



SEQ3

  1 Leu-Ile-Val-Gln-Trp-Leu-Arg-Glu-Lys-Arg-
 11 Val-Lys-Lys-His-Met-Ala-Ser-Leu-Pro-Leu-
 21 Gly-Pro-Gln-Pro-His-Ala-Leu-Ala-Pro-Pro-
 31 Leu-Gln-Leu-His-Asp-Gly-Asp-Ala-Leu-Lys-
 41 Arg-Arg-Pro-Glu-Leu-Asp-Ser-Asp-Lys-Glu-
 51 Met-Ser-Ala-Ala-Val-Ile-Glu-Gly-Asn-Asp-
 61 Ala-Val-Thr-Gly-His-Ile-Ile-Ser-Thr-Thr-
 71 Ile-Gly-Gly-Lys-Asn-Gly-Glu-Pro-Lys-Gln-
 81 Thr-Ile-Ser-Tyr-Met-Ala-Glu-Arg-Val-Val-
 91 Gly-Thr-Gly-Ser-Phe-Gly-Ile-Val-Phe-Gln-
101 Ala-Lys-Cys-Leu-Glu-Thr-Gly-Glu-Ser-Val-
111 Ala-Ile-Lys-Lys-Val-Leu-Gln-Asp-Arg-Arg-
121 Tyr-Lys-Asn-Arg-Glu-Leu-Gln-Leu-Met-Arg-
131 Pro-Met-Asp-His-Pro-Asn-Val-Ile-Ser-Leu-
141 Lys-His-Cys-Phe-Phe-Ser-Thr-Thr-Ser-Arg-
151 Asp-Glu-Leu-Phe-Leu-Asn-Leu-Val-Met-Glu-
161 Tyr-Val-Pro-Glu-Thr-Leu-Tyr-Arg-Val-Leu-
171 Arg-His-Tyr-Thr-Ser-Ser-Asn-Gln-Arg-Met-
181 Pro-Ile-Phe-Tyr-Val-Lys-Leu-Tyr-Thr-Tyr-
191 Gln-Ile-Phe-Arg-Gly-Leu-Ala-Tyr-Ile-His-
201 Thr-Val-Pro-Gly-Val-Cys-His-Arg-Asp-Val-
211 Lys-Pro-Gln-Asn-Leu-Leu-Val-Asp-Pro-Leu-
221 Thr-His-Gln-Val-Lys-Leu-Cys-Asp-Phe-Gly-
231 Ser-Ala-Lys-Val-Leu-Val-Lys-Gly-Glu-Pro-
241 Asn-Ile-Ser-Tyr-Ile-Cys-Ser-Arg-Tyr-Tyr-
251 Arg-Ala-Pro-Glu-Leu-Ile-Phe-Gly-Ala-Thr-
261 Glu-Tyr-Thr-Ala-Ser-Ile-Asp-Ile-Trp-Ser-
271 Ala-Gly-Cys-Val-Leu-Ala-Glu-Leu-Leu-Leu-
281 Gly-Gln-Pro-Leu-Phe-Pro-Gly-Glu-Asn-Ser-
291 Val-Asp-Gln-Leu-Val-Glu-Ile-Ile-Lys-Val-
301 Leu-Gly-Thr-Pro-Thr-Arg-Glu-Glu-Ile-Arg-
311 Cys-Met-Asn-Pro-Asn-Tyr-Thr-Asp-Phe-Arg-
321 Phe-Pro-Gln-Ile-Lys-Ala-His-Pro-Trp-His-
331 Lys-Val-Phe-His-Lys-Arg-Met-Pro-Pro-Glu-
341 Ala-Ile-Asp-Leu-Ala-Ser-Arg-Leu-Leu-Gln-
351 Tyr-Ser-Pro-Ser-Leu-Arg-Cys-Thr-Ala-Leu-
361 Glu-Ala-Cys-Ala-His-Pro-Phe-Phe-Asn-Glu-
371 Leu-Arg-Glu-Pro-Asn-Ala-Arg-Leu-Pro-Asn-
381 Gly-Arg-Pro-Leu-Pro-Pro-Leu-Phe-Asn-Phe-
391 Lys-Gln-Glu-Leu-Gly-Gly-Ala-Ser-Met-Glu-
401 Leu-Ile-Asn-Arg-Leu-Ile-Pro-Glu-His-Val-
411 Arg-Arg-Gln-Met-Ser-Thr-Gly-Leu-Gln-Asn-
421 Ser
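
The one-letter and three-letter listings describe the same polypeptide, so one can be derived from the other and the two compared. A minimal sketch of that conversion, assuming only the 20 standard amino acids occur; the function name `three_to_one` is hypothetical, and the two rows in the example are typed out by hand from the start of the three-letter listing.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

def three_to_one(listing: str) -> str:
    """Convert a hyphen-separated three-letter listing to a one-letter string.

    Position numbers, hyphens and whitespace are ignored. A code that is not
    in THREE_TO_ONE raises KeyError, which flags damaged entries.
    """
    codes = re.findall(r"[A-Z][a-z]{2}", listing)
    return "".join(THREE_TO_ONE[code] for code in codes)

rows = """
  1 Leu-Ile-Val-Gln-Trp-Leu-Arg-Glu-Lys-Arg-
 11 Val-Lys-Lys-His-Met-Ala-Ser-Leu-Pro-Leu-
"""
print(three_to_one(rows))  # LIVQWLREKRVKKHMASLPL
```

Comparing the converted string with the one-letter listing, and its length with the SQL field, is a quick consistency check on a transcribed record.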
The present invention provides DNA mols. that constitute fragments of the
genome and cDNAs from Zea mays mays (HYBRID SEED #35A19) and Arabidopsis
thaliana (ecotype Wassilewskija), and polypeptides encoded thereby. The DNA
mols. are useful for specifying a gene product in cells, either as a
promoter or as a protein coding sequence or as a UTR or as a 3'
termination sequence, and are also useful in controlling the behavior of a
gene in the chromosome, in controlling the expression of a gene or as
tools for genetic mapping, recognizing or isolating identical or related
DNA fragments, or identification of a particular individual organism, or
for clustering of a group of organisms with a common trait. Arabidopsis
DNA is used in the present expt., but the procedure is a general one.
Protocols are provided for Southern hybridizations and transformation of
carrot cells. [This abstr. record is one of 15 records supplemental to
CA13316218528Q necessitated by the large no. of index entries required to
fully index the document and publication system constraints.]
corn Arabidopsis cDNA genome protein sequence